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Abstract 

Genetic linkage maps are indispensable tools in genetic and genomic studies. Recent development of 
genotyping-by-sequencing (GBS) methods holds great promise for constructing high-resolution linltage 
maps in organisms laclting extensive genomic resources. In the present study, linkage mapping was conducted 
for a bivalve mollusc {Chtamysfarreri) using a newly developed GBS method— 2 b-restriction site-associated 
DNA (2b-RAD). Genome survey sequencing was performed to generate a preliminary reference genome that 
was utilized to facilitate linkage and quantitative trait locus (QTL) mapping in C.farreri. A high-resolution 
linkage map was constructed with a marker density (3806) that has, to our knowledge, never been achieved 
in any other molluscs. The linkage map covered nearly the whole genome (99.5%) with a resolution of 
0.41 cM. QTL mapping and association analysis congruously revealed two growth-related QTLs and one po- 
tential sex-determination region. An important candidate QTL gene named which functions in the 
regulation of growth hormone production in vertebrates, was identified from the growth-related QTL 
region detected on the linkage group LG3. We demonstrate that this linkage map can serve as an important 
platform for improving genome assembly and unifying multiple genomic resources. Our study, therefore, ex- 
emplifies how to build up an integrative genomic framework in a non-model organism. 
Keywords: bivalve; genome sequencing; 2 b-RADgenotyping; linkage mapping; quantitative trait locus mapping 



1 . Introduction 

High-resolution linkage maps are exceptionally 
valuable tools in many genetic and genomic applica- 
tions, such as fine-scale quantitative trait locus (QTL) 
mapping, characterization of recombination hotspots, 
comparative genome analysis and genome scaffolding. 



"•"These authors contributed equally to this work. 



A key prerequisite for constructing a high-resolution 
linkage map is the availability of a large number of 
genetic markers (preferably, single-nucleotide poly- 
morphisms— SNPs), which has been unfortunately out 
of reach for most non-model organisms until just a 
few years ago due to the huge investments in large- 
scale marker discovery and array-based genotyping. 
Recent advances in the next-generation sequencing 
technologies have revolutionized our ability to obtain 
massive genomic resources in a rapid and cost-effective 
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manner and, therefore, greatly stimulate the develop- 
ment of a variety of genotyping-by-sequencing (GBS) 
methods.' Among available GBS methods, restriction 
site-associated DNA (RAD) has gained popularity for 
the construction of high-density linkage maps,' and 
several methods with simpler RAD library preparation 
protocols have been developed. ^'^ For example, Wang 
et al.^ have recently developed the 2b-RAD method 
by adopting Type MB restriction enzymes to simplify 
RAD library preparation. The streamlined 2b-RAD 
method features even and tunable genome coverage, 
providing a flexible genotyping platform to diminish 
the extensive sequencing effort required for species 
with large genomes. 

Bivalves comprise ~1 4 OOOspeciesworldwide,'^ con- 
stituting the second largest group of molluscs. Many 
bivalves are of economic importance and support 
both commercial fisheries and mariculture efforts. 
They make up an important source of food all over 
the world, with a production of over 11.7 million 
metric tons in 2008, corresponding to 22% of the 
global aquaculture production.^ Elucidation of genetic 
bases underlying economically important traits, such 
as growth and reproduction, has been a central task in 
bivalve genetic and breeding studies for the purpose 
of genetic improvement. Linkage maps have been con- 
structed for many bivalve species with the aim for QTL 
mapping of important traits. However, the resolution 
of these maps is generally low (mostly 10-20cM), 
thus limiting their usefulness in determining quantita- 
tive trait genes. Great progresses have been recently 
achieved by transcriptome sequencing for large-scale 
identification of candidate genes involved in bivalve 
growth and reproduction. Polymorphisms in 
some candidate genes have also been associated with 
the variation of bivalve growth traits."''^ Bivalves 
exhibit fascinatingly diverse sexual patterning. Some 
species are dioecious, whereas some are simultaneous 
hermaphrodites with a few as protandrous hermaphro- 
dites (male when young, then later become female) or 
proterogynous hermaphrodites (female when young, 
then later become male). Currently, the mechanism 
of sex determination remains poorly understood in 
bivalves. Some recent studies have associated the sex 
determination of bivalves with doubly uniparental in- 
heritance (DUI) of mitochondria' °'' ^ and a hypothetic- 
al model has been proposed.' ° However, the generality 
of the proposed model is constrained because DUI is a 
peculiar mtDNA inheritance system that exists only in 
a few bivalve species. 

The scallop Chlamys farreri (Jones et Preston 1 904) 
naturally distributes along the seacoasts of China, 
Japan and Korea, and is a commercially important 
bivalve species in China. With increasing acreage 
devoted to artificial farming, several issues such as de- 
clining production and frequent disease outbreaks 
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have raised great concern in modern scallop aquacul- 
ture. Currently, genetic studies focusing on scallop 
growth, reproduction and immunity represent active 
research directions. Genetic and genomic resources 
for C. farreri have rapidly expanded, including low- 
density linkage,' '^~' ^ physical'^ and cytogenetic 
maps,'^ fosmid and bacterial artificial chromosome 
(BAC) libraries,^°'^' a large number of expressed se- 
quence tags.^'^^ The direction of the present study 
was also shaped by our immediate need for building 
up an integrative genomic platform that unifies a 
variety of genomic resources, such as genome map, 
linkage map and physical map, and facilitates future 
genetic, genomic and breeding studies on C. farreri. 



2. Materials and methods 

2.1 . Genome survey sequencing and gene annotation 

A two-year-old C. farreri individual was chosen for 
genome survey sequencing. Genomic DNA was ex- 
tracted from its adductor muscle using the standard 
phenol/chloroform extraction method. Three paired- 
end DNA libraries with insert sizes of 1 75, 500 and 
800 bp were constructed by following the lllumina's 
standard DNA library preparation protocol and were 
then sequenced usingthelllumina HiSeq2 000 platform. 

Raw reads were first filtered to remove low-quality 
reads resulting from base-calling duplications or adapter 
contamination. Clean reads were assembled using the 
SOAPdenovo software, which applies the de Bruijn graph 
structure to construct contigs. Subsequently, all reads 
were realigned tothe contigs and the paired-end informa- 
tion was used to join unique contigs into scaffolds. Finally, 
intra-scaffold gaps were filled using paired-end extracted 
reads that had one read uniquely aligned on a contig 
and another read located in a gap region. Genomic 
sequences were archived in the Sequence Read Archive 
(SRA) database (accession no. SRPOl 8107). 

Our group has recently reported the most compre- 
hensive transcriptome resource for C. farreri by 454 se- 
quencing of cDNA libraries generated from diverse 
developmental stages and adult tissues,^ which 
provide an excellent basis for gene annotation of the 
obtained scaffolds. Genomic scaffolds were first anno- 
tated by aligning the 2 0 056 transcriptomic isogroup 
sequences (i.e. the longest isotig in each isogroup) 
onto these scaffolds using the GMAP program.^'' To in- 
crease the annotation rate, the unannotated scaffolds 
were compared against the Swiss-Prot and non-redun- 
dant (NR) protein databases using BlastX, with an 
E-value threshold of 1 £-4. For scaffolds without sig- 
nificant matches, tBiastX search with an £-value 
threshold of 1 £-4 was conducted against the NR nu- 
cleotide database for further annotation. To increase 
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computational speed, all Blast searches were limited to 
the top 1 0 significant hits for each query. 

2.2. Mapping families 

Twenty full-sib families were established in May 
2009 by single-pair mating of 'Penglai-Red' scallops, 
an elite C./flrren strain developed byour group with fea- 
tures of red shell, fast growth and disease resistance. 
Growth -related traits including shell length, shell 
width, shell height and body weight were measured 
for all families at the age of 1 5 months. One of these 
families exhibiting high within-family variation of 
growth traits was chosen for linkage and QTL analysis. 

2.3. 2 b-RAD sequencing 

2b-RAD libraries were prepared for two parents and 
96 progenies by following the protocol developed by 
Wang et al.^ For the parents, standard BsaXI libraries 
were constructed, whereas for the progenies, reduced 
representation (RR) libraries were constructed using 
adaptors with 5'-NNT-3' overhangs to target a subset 
of all BsaXI fragments in the C. farreri genome. 
A unique barcode was incorporated into each library 
during library preparation, and then all libraries were 
pooled for single-end sequencing (1 x 50 bp) using 
an lllumina GA-II sequencer. All the 2 b-RAD sequences 
were archived in the SRA database (accession no. 
SRA065207). 

2.4. Sequence data preprocessing and de novo 
genotyping 

Raw reads were first trimmed to remove adaptor 
sequences. The terminal 3-bp positions were excluded 
from each read to eliminate artefacts that might have 
arisen at ligation sites. Reads with no restriction sites 
or containing ambiguous base calls (N), long homopo- 
lymer regions (>10 bp), excessive numbers of low- 
quality positions (>5 positions with quality of <10) 
or mitochondrial origins were removed. The remaining 
trimmed, high-quality readsformed the basisforsubse- 
quent analysis. 

De novo 2 b-RAD genotyping was performed usi ng the 
RADtyping program vl.O (http://www2.ouc.edu.cn/ 
mollusl</detailen.asp?ld=72 7) under default para- 
meters. The RADtyping program has recently been 
developed by our group, and it is an integrated pipeline 
that enables both accurate de novo codominant and 
dominant genotyping in mapping populations. 

2.5. Linl<age map construction 

Segregating markers that could be genotyped in at 
least 80% of the individuals were considered as high- 
quality markers and were retained for further analysis. 
For each segregating marker, goodness of fit of the 
observed with the expected Mendelian ratio was 



assessed using the test, and those markers conform- 
ing to the expected Mendelian ratios (P > 0.05) were 
included in map construction. Linkage ana lysis was per- 
formed using the JoinMap 4.0 software.^^ Sex-specific 
maps were first constructed for each parent. Maternal 
and paternal datasets were created using the function 
'Create Maternal and Paternal Population Nodes' in 
the JoinMap program, which was also used to partition 
1 :2:1 -typedata into 1 :1 female-type and 1 :1 male-type 
data. A logarithm of odds (LOD) threshold of at least 6.0 
was used to assign markers into linkage groups. The re- 
gression mapping algorithm was selected for map con- 
struction. The Kosambi mapping function was used 
to convert the recombination frequencies into map 
distances (centiMorgans). A consensus map was estab- 
lished by integrating the sex-specific maps through 
shared markers using the MergeMap software.^^ 

2.6. QTL mapping and association analyses 
of growth traits and sex 

The distributions of growth traits were first assessed 
using the univariate procedure of MATLAB software to 
check whether they followed normal distributions. 
Pearson correlations among four growth traits were cal- 
culated using pairwise comparison. 

The regression method was used for QTL mapping of 
both growth traits and sex. Here,sexwasusedasa binary 
trait, supposing the haplotypes of the male and female 
parentsare Hpp, Hp^, HmpSnd Hmm)thatflf, ~c/f,flrnand 
-flrn are their respective effects on the trait of interest, 
and Ppp, Ppm, Pmp and Pmm are the respective individual 
inheritance probabilities at putative positions in the cor- 
responding parental haplotypes. Following the above 
assumption, Ppp + Ppm = Pmp + Pmm = 1 ■ Therefore, the 
hypothesis to detect each QTL is: 

y = |X + S + (Ppp - Ppm)flf + (Pmp + fmm)flm + e (1 ) 

and its null hypothesis is: 

y=\L + s + e (2) 

whereyisthe observed va I ue of the tra it of i nterest; jx, the 
population mean; s, the fixed effect of sex and e, the 
random error. 

For each chromosome, midpoints of all marker 
brackets along the chromosome were considered to 
be putative QTL positions, and the inheritance prob- 
abilities for individuals at each position were calculated 
using the IBD program^'' and the corresponding LOD 
value was determined by: 

LOD = -^(logioe)LRT= 0.21 7LRT (3) 

where the LRTstatisticwascalculated through the com- 
parison of Equations (5) and (6). Chromosome-wide 
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and genome-wide critical threshold values for QTL de- 
tection were approximated using Piepho's method. 
If a QTL was detected, the confide nee i nterva I was ca Icu- 
lated using Li's method. 

Association analysis was performed as a complement 
approach to QTL mapping, which was implemented 
using an R package called 'GWAF'^° that tests genetic as- 
sociation between genotypes and traits by fitting a 
linear model accounting for within full-sib pedigree 
correlation. The additive model in GWAF was selected 
by use of the kinship coefficient matrix in the analysis 
to avoid false positives owing to unexpected familial 
correlation. The functions 'Ime.batch' and 'gee.lgst.- 
batch' were used to perform a global test (i.e. Wald 
X' test) of genotypic effects with growth traits and sex, 
respectively. Bonferroni correction was carried out to 
account for multiple comparisons. 

2.7 . Integration of the linkage map, genomic 
scaffolds and a physical map 
To associate genomic scaffolds with the linkage 
map, all markers in the linkage map were mapped 
onto the scaffolds using the SOAP2 software^^ (para- 
meters -M 4, -V 2). Marker-associated scaffolds were 
further used as linkers to integrate the linkage map 
with a BAC-based physical map previously constructed 
for C. farreri.^^ These scaffold sequences were com- 
pared against available BAC-end sequences (BESs) 
using BlastN with an £-value cut-off of 1 £- 1 0. A recip- 
rocal comparison was performed to ensure one-to-one 
matches between the scaffolds and BAG contigs. 

3. Results 

3.1 . Genome survey sequencing ofC. farreri 

ThegenomesizeofC./flrren is relatively large and was 
previously estimated as ~1 .2 Gb.^° Genome survey se- 
quencing was performed for a si ngleC./flrrm individual 
by sequencing three paired-end DNA libraries with 
insert sizes of 1 75, 500 and 800 bp based on the 
lllumina HiSeq2000 platform. In total, 62.4 Gb of 
sequencing data, equivalent to ~52 x genome cover- 
age, were produced (Table 1 ). K-mer analysis revealed 
remarkably high heterozygosity (~1.4%) in the 



Table 1. Summary of genome survey sequencing data for C./orren 



Paired-end 
libraries 


Read 

lengtli 

(bp) 


Insert 
size (bp) 


Total 
data (G) 


Sequencing 
depth (X) 


LI 


1 00 


1 80 


32.7 


27.24 


L2 


100 


500 


21 .4 


1 7.82 


L3 


100 


800 


8.4 


6.98 


Total 






62.5 


52.04 



C. farreri genome (Fig. 1 ). Owing to the high genome 
heterozygosity, assembly of these data produced a 
large number of small scaffolds (N50 = 1 .5 kb), with 
the longest one only reached to 46 kb (Table 2). All 
the scaffold sequences are accessible at http://ipl.ouc. 
edu.cn/fuxiaoteng/cf_SRA_data. Of 20 056 isogroups 
in the C. farreri transcriptome,^ 18 316 (91.3%) 
could match with 1 6 489 genomic scaffolds, suggest- 
ingthatthe majority of genes were present in our refer- 
ence genome dataset. Of the mapped transcriptomic 
isogroups, 88.4% had been functionally annotated 
through the homology-based comparison against 
public databases. Summing up all scaffolds resulted in 
a total length of ~870 Mb, corresponding to ~70% of 
genome coverage; though not complete, it constitutes 
the most informative reference genome currently avail- 
able for C. farreri. 

3.2. Optimization of a 2b-RAD sequencing plan 
for linkage mapping 
Because of the large genome size (~ 1.2 Gb) of 
C. farreri, it is necessary to devise a cost-effective se- 
quencing plan that can balance the total sequencing 
cost and genotyping accuracy. In total, 21 0 570 BsaXI 
sites were identified from the reference genome 
dataset. Based on the genome coverage (~70%) of 
this dataset, the total number of BsaXI sites in the 
C. farreri genome was estimated as ~0.3 million. 
Sequencing all of the BsaXI sites to 20 x for a large 
mapping population (e.g. hundreds of individuals) 
would require a large sequencing investment. A 
notable feature of the 2b-RAD method is its ability to 
fine tune marker density as needed by constructing 
RR libraries using less-degenerate adaptors.^ For 
example, an RR library constructed using adaptors 
with 5'-NNT-3' overhangs only targets ~1 /I 0th of all 
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Depth (X) 

Figure 1. 1 7-mer frequency distribution of sequencing reads. All 
1 7-mer sequences were extracted from high-quality, paired-end 
reads generated by sequencing three genomic libraries with 
different insert sizes. The frequency of each 1 7-mer was calculated 
and plotted. Two peaks rather than one were observed, indicating 
high genome heterozygosity. 
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Table 2. Statistics of genome assembly for C. farreri 



/s.-Iilfc:r — O / 


l^Ul 1 LI] 




Scaffold 


Size (bp) 


Number 


Size (bp) 


Number 


N90 


330 


708,971 


395 


595,809 


N80 


503 


497,805 


603 


420,096 


N70 


713 


353,71 5 


851 


298,664 


N60 


960 


249,1 75 


1 1 A 9 
1 1 4- Z 


Z 1 U,U U 1 


N50 


1 267 


1 70,707 


1 C 1 7 
1 D 1 / 


1 /I Q C.C^C 


Average length 


807 




958 




Longest 


36,486 




46,1 76 




Total length 


863,450,239 




870,802,763 




Total number 


1,069,270 




908,703 




Length 


1,069,270 




908,703 




>1 00 bp 










Length >2 kb 


82,556 




94,956 





the BsaXI sites in the C./orrer/ genome, and sequencing 
of such a library to 20 x would require only 0.6 million 
reads per individual (in contrast to 6 million reads per 
individualfora standard BsaXI library), thusdramatical- 
ly reducing the total sequencing cost. 

3.3. 2b-RAD sequencing of a C. farreri mapping 
population 

Based on the calculations above, RR libraries were 
constructed for a C. farreri mapping population, se- 
quencing of which produced 0.6-1.2 million high- 
quality reads per progeny. For the parents, standard li- 
braries were constructed and sequenced to a sufficient 
depth (70-80x). As shown in Fig. 2, target sites of the 
RR libraries could be reproducibly detected among dif- 
ferent progeniesand theiraverage sequencingcoverage 
was ~6-fold higher than that of standard libraries. The 
sequencing depth for progenies ranged from 1 3.4 to 
23.8 with an average of 16.75. Clustering parental 
reads resulted in 1 81 625 representative reference 
sites consisting of 138 141 parent-shared sites and 
43 484 parent-specific sites. After filtering low-quality 
sites, 117 113 parent-shared and 35 799 parent-spe- 
cific sites remained. These reference sites contained 
92% of the unique BsaXI sites inferred from the scaf- 
folds, suggesting that unique sites were well repre- 
sented in the high-quality reference sites. 

3.4. Higli-resolution linl<age mapping 

In total, 7458 polymorphic markers were identified, 
of which 6842 were heterozygous in at least one 
parent, including 2196 co-dominant and 4646 
dominant markers. Dominant markers showing 3: 1 
segregation pattern in progenies were not included 
in subsequent analysis due to their inability to 
construct sex-specific maps. Totally, 4881 markers 



that conformed to the expected Mendelian ratios 
(P>0.05) were included in the linkage analysis. 
These markers were partitioned into maternal (1 : 1 
female type; 2473 markers) and paternal (1 : 1 male 
type; 2175 markers) datasets for constructing sex- 
specific linkage maps. At the LOD threshold of 6.0, the 
markers were grouped into 1 9 linkage groups, corre- 
sponding to the haploid chromosome number of C. 
farreri.^^ The female map contained 2 02 5 markers 
and spanned 1 1 75.4 cM with an average marker inter- 
val of 0.59 cM, whereas the male map contained 1 861 
markers and spanned 1 1 54.9 cM with an average 
marker interval of 0.62 cM. Significant differences in 
recombination rates between the female and male 
maps were observed for linkage groups LG5, LG8, 
LGIO and LGl 5 (Table 3). LG5 and LGIO showed 
higher recombination rates in females than in males, 
whereas the contrary was found for LG8 and LGl 5. 

The sex-specific maps were further integrated using 
anchor markers that were heterozygous in both 
parents. The consensus map contained 3806 markers 
(Fig. 3, Table 3) and spanned 1 543.4 cM with an 
average marker interval of 0.41 cM. The length of 
each linkage group ranged from 54.9 to 99.6 cM, and 
the marker density varied from 1 23 to 2 74 across the 
linkage groups. Mapping all the markers to the anno- 
tated genomic scaffolds revealed 623 markers (18- 
53 per linkage group; Table 3) that resided in or close 
to genes. These markers represent a valuable resource 
for future comparative genome analysis. 

Genome length was estimated to be 1 544.2 cM 
(Gel) and 1 559.5 cM (Gei) using two different 
methods.^^'^"^ The average of the two estimates 
(1 551.9 cM) was used as the expected genome 
length. Genome coverage of the consensus map was 
nearly complete and reached 99.5%. Given the esti- 
mated genome size of ~1.2 Gb for C. farreri,^° the 
average recombination rate across all the linkage 
groups was ~1 .3 cM/Mb. 

3.5. QTL mapping and association analyses 
of growth traits and sex 
Pairwise comparisons among five growth traits using 
Pearson's correlation revealed that all of them were 
highly correlated with correlation coefficients ranging 
from 0.77 to 0.96 (Table 4). The strongest correlation 
was observed between shell length and shell height 
(r=0.96, P=l£-10). A genome scan for these 
growth traits showed that they exhibited quite similar 
LOD profiles (Fig. 4), indicating that these traits 
were possibly regulated by the same set of genes. 
Therefore, a principle component analysis was con- 
ducted, which revealed that the first principal compo- 
nent (PCI) explained ~90% of the total phenotypic 
variation (Fig. 5). To increase the QTL detection 
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Figure 2. Distribution of sequencing coverage for target sites derived from 365 randomly chosen C. farreri scaffolds (concatenated). For 
parents, sequencing data were generated from standard BsaXI libraries, whereas for progenies, RR libraries were sequenced. Target sites 
of the RR libraries could be reproducibly sequenced among progenies and their sequencing coverage was ~6-fold higher than for 
standard libraries. 



power, PCI was utilized as a composite trait for QTL 
mapping. Two significant QTLs were detected on LGl 
and LG3 (Fig. 6a and Table 5) and the one on LG3 
that passed the genome-wide significance threshold 
showed a stronger QTL signal. The two QTLs explained 
1 1 .4 and 1 6.9% of the total phenotypic variation, re- 
spectively. Given the high marker density in the 
linkage map, association analysis was also performed 
to serve as a complementary approach to evaluate the 
QTL mapping results. Association analysis revealed a 
similar distribution pattern across all linkage groups 
as in QTL mapping analysis (Fig. 6a) and supported 
QTL mapping results in general. Within the confidence 
interval of the QTL detected on LG3, one marker 
named f68558 was found to be located in the 



intron region of a transcription factor gene, PROPl 
(Homeobox protein prophet of Pit-1) (Fig. 7 and 
Supplementary Table SI). It has been shown that 
PROPl can regulate the production of growth 
hormone,^^ a peptide that simulates growth, cell 
reproduction and regeneration in animals. Identi- 
fication of PROPl within the QTL region suggests that 
this gene may represent an important candidate QTL 
gene that is worthy of further investigation. 

Whileforsex,ahighlysignificantQTLwasfinely mapped 
on LGl 1 with the confidence interval of 0.37 cM 
(34.61 -34.98 cM; Fig. 6b). Moreover, no global differ- 
ence (female : male = 1.01) in the recombination rate 
was observed for this linkage group between the two 
sex-specific maps (Table 3). Association analysis revealed 
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Table 3. Summary of the consensus linkage map in C. farreri 



Linkage group 


Number of 
ma rkers 


Length 


Average marker 
interval (cM) 


Female/male 
recombination rate 


Number of 
annotated markers 


1 


274 


96.09 


0.35 


0.97 


41 


2 


272 


88.62 


0.33 


0.81 


43 


3 


271 


95.36 


0.35 


1 .04 


37 


4 


241 


75.95 


0.32 


1.00 


39 


5 


235 


71.48 


0.31 


1.37 


44 


6 


230 


99.62 


0.44 


1.24 


53 


7 


223 


77.77 


0.35 


1.00 


42 


8 


207 


68.57 


0.33 


0.70 


35 


9 


203 


91.94 


0.46 


1 .30 


31 


10 


199 


54.99 


0.28 


2.25 


30 


1 1 


1 89 


96.59 


0.51 


0.97 


28 


1 2 


1 83 


92.82 


0.51 


0.87 


26 


1 3 


1 72 


81.04 


0.47 


1.14 


32 


1 4 


1 67 


65.84 


0.40 


1 .34 


1 8 


1 5 


1 66 


78.2 


0.47 


0.62 


33 


16 


1 62 


84.01 


0.52 


0.94 


27 


1 7 


147 


81 


0.55 


0.82 


23 


1 8 


142 


64.36 


0.46 


1.07 


21 


19 


1 23 


79.1 1 


0.65 


1.00 


20 


All 


3,806 


1 543.36 


0.41 


1.01 


623 



2 7 sex-related markers with high statistical significance 
(P< l£-6; Supplementary Table S2) and all of them 
fell into the narrow QTL region identified by the QTL 
mapping analysis. Interestingly, one sex-related marker 
named f9 342 2 was located in the coding region of a tran- 
scription factor gene, ZNFXl (NFXl-type zinc finger- 
containing 1), which has been shown to be tightly 
linked with AMI-IR2 (anti-Mullerian hormone receptor 
type ll),the sex-determination gene in the tiger pufferfish, 
Takifugu rubripes.^^ 

3.6. Integration of the linkage map, genomic scaffolds 
and a physical map 
A high-resolution linkage map can serve as an im- 
portant platform for integrating multiple genomic 
resources. One such application is to improve genome 
assembly by orienting genomic scaffolds, ^or C. farreri, 
a majority of larger genomic scaffolds contained at 
least one BsaXI site (77% for scaffolds of > 5kb; 92% 
for scaffolds of >1 0 kb) and therefore, it could be po- 
tentially oriented using a high-density linkage map. Of 
the 3806 markers in the consensus linkage map, 
21 74 could be unambiguously anchored to scaffolds. 
These scaffolds were not only useful for directing a 
higher-level genome assembly, but also could serve as 
'long' surrogates for 2b-RAD tags to enhance their 
utility in unifying genomic resources. To demonstrate 



this potential, we attempted to integrate the linkage 
map with a BAC-based physical map previously con- 
structed for C. farreri. The physical map contained 
2514 BAG contigs that had BESs; however, a large frac- 
tion of them had only a few BESs (e.g. 73% with less 
than three sequenced BAG clones), thus limiting the in- 
tegration efficiency. Even so, 681 BAG contigs could still 
be linked with the linkage map through sequence com- 
parison between BESs and marker-associated scaffolds 
(Supplementary dataset). An example showing the in- 
tegration of LGl and BAG contigs is present in Fig. 8. 



4. Discussion 

4. 1 . Genome sequencing in bivalves 

In spite of species abundance and diverse geographic- 
al distribution, little effort has been devoted to decod- 
ing bivalve genomes. To date, whole-genome 
sequencing has been performed for only two bivalve 
species, i.e. Pearl oyster {Pinctada fucata)^^ and Pacific 
oyster {Crassostrea gigas)^^ providing the first glimpse 
into bivalve genome architecture. Apparently, genome 
sequencing of a broader range of bivalves would 
enable a better understanding of their phylogeny, spe- 
ciation and diversification. Our study represents the 
first step towards fully decoding a scallop genome. 
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ure 3. The high-density consensus linloge map oi C. farreri, which contained 3806 marl<ers in 1 9 linkage groups. 
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Table 4. Pearson's correlation coefficients among five growth traits 
in C.farreri 





Shell 
height 


Shell 
length 


Shell 
width 


Body 
weight 


Muscle 
weight 


Shell 
height 


1.00 


0.96 


0.85 


0.94 


0.84 


Shell 
length 




1.00 


0.83 


0.93 


0.81 


Shell 
width 






1.00 


0.82 


0.77 


Body 
weight 








1.00 


0.93 


Muscle 










1.00 



weight 



Genome analysis revealed that C. farreri possesses a 
highly heterozygous genome. This finding is not 
unusual and has been reported in other bivalve 
species.^^'^^ For example, it has been shown that, for 
the Pacific oyster, average density of SNPs is one SNP 
per 60 bp in coding regions and one per 40 bp in 
non-coding regions.^^ High genome heterozygosity 
poses the challenge of efficient genome assembly. In 
such instance, a high-resolution linkage map would 
be much valuable in that they can orientate the small 
scaffolds to improve genome assembly. 

4.2. Devising an optimal 2b-RAD sequencing plan 
for linkage mapping 
Large-scale linkage mapping studies generally call 
for a cost-effective sequencing plan that reasonably bal- 
ances the sequencing cost and genotyping accuracy. 
Through simulation analysis, a sequencing depth of 
20 X for both parents and progenies can provide suffi- 
ciently high genotyping accuracy (>96%; unpublished 
data). Determining the amount of sequencing neces- 
sary for a desired sequencing depth would require the 
knowledge of a total number of restriction sites in a 
given genome. Without a reference genome, the total 
number of restriction sites can be simply predicted 
based on genome size and GC content by assuming 
sites that are Poisson distributed. However, it has been 
shown that such estimation departed considerably 
from the actual numbers in some instances.' Our 
study demonstrates that genome survey sequencing 
can facilitate devising an optimal 2b-RAD sequencing 
plan by providing sufficient genome information to de- 
termine the minimal amount of sequencing required 
for a given sequencing depth. Meanwhile, the obtained 
genome information can also be used to enhance de 
novo 2b-RAD analysis in many other aspects, such as 
evaluation of the efficiency and reliabilityoft/e not'o ref- 
erence site reconstruction, identification of genes sur- 
rounding 2b-RAD tags and association of 2b-RAD tags 
with public genomic resources. 



For species with very large genomes, sequencing all 
BsaXI sites at a depth of 20 x for all individuals 
remains a substantial investment. For example, sequen- 
cing 200 human individuals with the genome size of 
~3 Gb would require 5.2 billion reads, which are 
approximately equivalent to the number of reads 
produced from >4 full sequencing runs using the 
HiSeq2000 platform. One advantage of 2b-RAD meth- 
odology is the option to adjust marker density to the 
desired level using selective adaptors. Sequencing cost 
can be further reduced by sequencing a subset of 
BsaXI sites deriving from RR libraries. We have demon- 
strated that sequencing a C. farreri RR library only 
required about one-sixth of the sequencing cost for a 
standard library at a given sequencing coverage, thus 
reducing the total sequencing cost dramatically. 
Therefore, sequencing RR libraries represents an effect- 
ive option for large-scale linkage mapping studies 
dealing with species with large genomes. 

4.3. Linkage map construction and QTL analysis 

Thus far, linkage maps have been constructed for 
many bivalve species including C. farreri.^ 
However,these maps were usually built using hundreds 
of markers and the resolution was generally low (mostly 
10-20cM), limiting their use in fine-scale QTL 
mapping and many other applications. In this study, 
the linkage map constructed for C. farreri contained 
3806 markers, a marker density that has, to our knowl- 
edge, never been reached for any other bivalve species. 
This linkage map covered nearly the whole genome 
(99.5%) with a resolution of 0.41 cM. In addition, 
~1 0% of the markers in the linkage map residing in or 
close to genes as revealed by mapping them to the 
annotated genomic scaffolds. Our map, therefore, pro- 
vides sufficient resolution for fine-scale QTL mapping 
and facilitates the discovery of quantitative trait genes. 

Growth traits are of particular interest to scallop 
researchers due to their high commercial significance 
in scallop aquaculture. QTL mapping represents an effi- 
cient approach to identify genetic loci underlying these 
traits for marker-assisted selection in genetic breeding. 
Two growth-related QTLs were detected on the linkage 
groups LGl and LG3, which are also supported by 
the association analysis, a complementary approach 
to evaluate the QTL mapping results. The markers 
located at the confidence i ntervals of these QTLs consti- 
tute a valuable markersetforfurtherevaluation of their 
utility in marker-assisted selection. In particular, the 
homeotic gene PROP1 , which was identified from the 
QTL region on LG3, drew our great interest. This gene 
encodes a paired-like homeodomain protein that is ne- 
cessary for P/T7 (growth hormone factor 1 ) expression. 
In mammals, mutations in this gene can result in 
impaired production of growth hormone and other 
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Figure 4. A genome scan of LOD profiles for five growth traits in C./flrren\ All traits exhibited quite similar LOD distributions. SH, shell height;^ 
AMW, adductor muscle weight. The dark and light dashed lines indicated the genome-wide and chromosome-wide significance thresholds. 
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Principal Component Component 1 

Figures. Principal component analysis offivegrowtli traits in C./orrer/. The PCI explained ~ 90% of the total variance. SH, shell height; SL, shell 
length; SW, shell width; BW, body weight; AMW, adductor muscle weight. 




12- 



Figure 6. QTL mapping and association analysis of growth traits (a) and sex (b) in C. farreri. The PCI for five growth traits was used in QTL 
mapping and association analyses. The solid and dashed lines indicate the genome-wide and chromosome-wide significance thresholds, 
respectively. 



pituitary hornnones.^^ Polymorphisms in thisgene have 
been associated with production traits in livestock 
animals, such as cattle,'*" goats '**''*^ and sheep.'*^ The 
marl<erf68558 is probably not the causal locus that is 
responsible for growth trait variation in C. farreri. 
Instead, we think it is more likely that f68558 is in 
tight linkage disequilibrium with the causal mutation 
that is possibly located in the regulatory region (e.g. pro- 
moter) of PR0P1 . The expression level of PROPl has 
been checked in major adult tissues (e.g. adductor 
muscle, digestive gland, gill, mantle, male and female 



gonads and kidney) of C. farreri. It turned out that 
PROPl was not expressed in neither of these tissues 
(data not shown), suggesting that scallop PROPl is 
probably expressed and functioning during early 
embryonic development just like its homologues in 
vertebrates. In vertebrates, PROP1 is only temporarily 
expressed during early pituitary development.'*'^ 
However, determining the temporal and spatial expres- 
sion of PROPl in bivalves is not straightforward, since 
the counterpart of pituitary has not been established 
in bivalves. Further work would be needed to overcome 
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these barriers to evaluate the role(s) of PROP 1 in scallop 
growth regulation. 

Unlike many other animals, scallops do not possess 
recognizable heteromorphic sex chromosomes,"*^ and 
little is currently known about their sex-determin- 
ation mechanism. The use of molecular markers in 
mapping experiments to identify regions involved 
in sex determination represents an efficient approach 
to study sex-determination systems in species without 
heteromorphic sex chromosomes. In this study, 
both QTL mapping and association analysis revealed 
a highly significant sex-related QTL on LGl 1 , suggest- 
ing the presence of a narrow sex-determination 
region (34.61 -34.98 cM) in C. farreri. No global dif- 
ferences in recombination rates were detected for 
LGl 1 between the two sexes, which may account 
for the indistinguishable morphologies of sex chro- 
mosomes in C. farreri. Interestingly, we found that 
one sex-related marker (i.e. f9342 2) was situated at 
the transcription factor gene, ZNFX1 , which has 
been shown to be tightly linked with the sex-deter- 
mination gene AMHR2 in the tiger pufferfish.^^ 
Unfortunately, we do not have direct evidence to 
support the physical linkage between ZNFXl and 
AMHR2 based on the available C. farreri genome 
information. Furthermore, searching another two 
molluscan genomes, i.e. the Pacific oyster C. gigas 
(http://gigadb.org/pacific_oyster) and the owl limpet 



Tables. QTL mapping of the PCI of five growth traits in C. /orreri 



Trait LG 


Confidence 


Closest marker 


PV 


LOD 




interval (cM) 


(position) 


(%r 




PCI 1 


55.1 3-70.68 


dm28438 (60.34), 
df12539 (61.1 0) 


1 1.38 


2.52 


3 


34.14-61.82 


m7430 (48.70), 
f1 7843 (48.72) 


16.89 


3.76 



^Phenotypic variance (PV) explained by a QTL 



Lottia gigantea (http://metazoa.ensembl.org/Lottia_ 
gigantea), did not support the two genes' linkage 
either, though this does not necessarily exclude 
the possibility of their physical linkage because 
these genome assemblies are still largely fragmented 
(scaffold N50 is 401 kb for C. gigas^^ and 1.87 Mb 
for L. gigantean'^^). Our group has initiated the whole- 
genome sequencing project for C. farreri; therefore, 
this issue will be settled in the near future. 



4.4. Integration oftlie linkage map with multiple 
genomic resources 
Recent studies have shown that RAD-based linkage 
maps can greatly improve large-scale genome assem- 
[^ly 47,48 However, RAD tags are usually short in length 
(35-1 00 bp), thus limitingtheirdirect use for integra- 
tion with other important genomic resources gener- 
ated from RNA-Seq, ChlP-Seq, exome sequencing 
or BAC-fosmid-end sequencing. Here, we propose 
that RAD tags can be effectivelyextended usingthescaf- 
folds obtained from genome survey sequencing. For 
C. farreri, although a majority of the scaffolds were 
not very long (scaffold N50 = 1.5kb), they could 
already serve as 'long' surrogates for 2b-RAD tags to 
enhance their utility. We have demonstrated that 
this approach could facilitate the integration of the 
scallop linkage map with the BAC-based physical 
map, even though a large fraction of BAG contigs con- 
tained only a few BESs (e.g. 73% with less than three 
sequenced BAG clones). Besides, such approach can 
also facilitate the identification of genes surrounding 
a given tag, which can be crucial for the identification 
of quantitative trait genes in QTL mapping and asso- 
ciation studies. Therefore, it can be envisioned that 
this scallop linkage map would serve as an important 
platform for unifying genomic resources that have 
been accumulated for C. farreri. 
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Figure 7. The gene structures of PROP/ (Homeobox protein prophet of Pit- 1 ) in two sea I lop species. For P yessoe/is/s, the full length of PROP? was 
obtained from an ongoing genome sequencing project. For C. farreri, homologous genomic scaffolds were shown. The marker f68558 
found in the confidence interval of a growth-related QTL detected on LG3 was located at the intron region of this transcription factor 
gene. The marker sequences were shown for the two species with the polymorphic locus in C. farreri indicated. 
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Figure 8. Schematic demonstration of the integration of linkage 
group LGl , genomic scaffolds and a BAC-based physical map. 



4.5. Conclusions 

We generated a preliminary reference genome for a 
bivalve mollusc, expanding the genome resources cur- 
rently available for bivalves— a group of animals that 
remain largely unexplored in termsof genome sequen- 
cing. A high-resolution linkage map was constructed for 
C. farreri \N\th a marker density that has, to our knowl- 
edge, never been achieved in any other molluscs. Two 
growth -related QTLs and one putative sex-determin- 
ation region were detected, and candidate genes iden- 
tified from these QTL regions represent important 
targets for further evaluation. Integration of the 
genome map, linkage map and physical map exempli- 
fies how to build up a comprehensive genomic frame- 
work in a non-model organism. 
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